Finite-state Methods for Multimodal Parsing and Integration

Authors

  • Michael Johnston
  • Srinivas Bangalore
Abstract

Finite-state machines have been extensively applied to many aspects of language processing, including speech recognition (Pereira and Riley, 1997; Riccardi et al., 1996), phonology (Kaplan and Kay, 1994; Karttunen, 1991), morphology (Koskenniemi, 1984), chunking (Abney, 1991; Joshi and Hopely, 1997; Bangalore, 1997), parsing (Roche, 1999), and machine translation (Bangalore and Riccardi, 2000). In Johnston and Bangalore (2000) we showed how finite-state methods can be employed in a new and different task: parsing, integration, and understanding of multimodal input. Our approach addresses the particular case of multimodal input to a mobile device where the modes are speech and gestures made on the display with a pen, but it has far broader application. The approach uses a multimodal grammar specification which is compiled into a finite-state device running on three tapes. This device takes as input a speech stream and a gesture stream and outputs their combined meaning. The approach overcomes the computational complexity of unification-based approaches to multimodal processing (Johnston, 1998), enables tighter coupling with speech recognition, and enables straightforward composition with other kinds of language processing such as finite-state translation (Bangalore and Riccardi, 2000). In this paper, we present a revised and updated finite-state model for multimodal language processing which incorporates a number of significant advancements to our approach. We show how gesture symbols can be decomposed into attributes in order to reduce the alphabet of gesture symbols and enable underspecification of required gestures. We present a new mechanism for abstracting over gestural content that cannot be captured in the finite-state machine. We address the problems relating to deictic numerals (Johnston, 2000) by introducing a new mechanism for aggregation of adjacent gestures.
The examples we use are drawn from a new, more sophisticated multimodal application which provides mobile access to city information such as the locations of restaurants and theatres (we will demonstrate this application as part of our presentation). We will also draw examples from other applications as needed. In addition to addressing multimodal rather than unimodal input, another novel aspect of our approach is that we use the finite-state representation to build the meaning representation. We first present the basics of the finite-state approach and then go on to discuss each of the innovations in turn.
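
The three-tape idea described above can be illustrated with a toy sketch. This is not the authors' implementation or grammar formalism: the arc format, the symbol names (such as `G:point:restaurant` and `<selected-id>`), and the `parse` function are all assumptions made up for illustration. Each arc reads at most one symbol from the speech tape and one from the gesture tape, and writes at most one symbol to the meaning tape.

```python
from typing import List, Optional, Set, Tuple

EPS = ""  # epsilon: nothing is read from or written to that tape on this arc

# Arc = (source state, speech symbol, gesture symbol, meaning symbol, target state).
# This tiny grammar fragment and every symbol name in it are hypothetical.
Arc = Tuple[int, str, str, str, int]

ARCS: List[Arc] = [
    (0, "phone", EPS, "phone(", 1),
    (1, "for", EPS, EPS, 2),
    (2, "this", EPS, EPS, 3),
    # The spoken noun must be accompanied by a pointing gesture on a restaurant.
    (3, "restaurant", "G:point:restaurant", "restaurant(<selected-id>)", 4),
    (4, EPS, EPS, ")", 5),
]
FINAL: Set[int] = {5}


def parse(speech: List[str], gesture: List[str],
          arcs: List[Arc] = ARCS, start: int = 0,
          final: Set[int] = FINAL) -> Optional[List[str]]:
    """Return the meaning tape of one accepting path, or None if there is none.

    Depth-first search over (state, speech position, gesture position).
    Assumes the machine has no cycles made only of epsilon-input arcs.
    """
    stack = [(start, 0, 0, [])]
    while stack:
        state, i, j, out = stack.pop()
        if state in final and i == len(speech) and j == len(gesture):
            return out
        for src, s, g, m, dst in arcs:
            if src != state:
                continue
            if s != EPS and (i >= len(speech) or speech[i] != s):
                continue  # speech tape does not match
            if g != EPS and (j >= len(gesture) or gesture[j] != g):
                continue  # gesture tape does not match
            new_out = out + [m] if m != EPS else out
            stack.append((dst, i + (s != EPS), j + (g != EPS), new_out))
    return None


if __name__ == "__main__":
    words = ["phone", "for", "this", "restaurant"]
    print(parse(words, ["G:point:restaurant"]))
    print(parse(words, []))  # the required gesture is missing
```

In this sketch the utterance is accepted only when both input streams are consumed in full, so the same spoken words without the accompanying gesture yield no meaning.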

Similar resources

Finite-state Multimodal Parsing and Understanding

Multimodal interfaces require effective parsing and understanding of utterances whose content is distributed across multiple input modes. Johnston (1998) presents an approach in which strategies for multimodal integration are stated declaratively using a unification-based grammar that is used by a multidimensional chart parser to compose inputs. This approach is highly expressive and supports a b...

Multimodal Language Processing For

Interfaces for mobile information access need to allow users flexibility in their choice of modes and interaction style in accordance with their preferences, the task at hand, and their physical and social environment. This paper describes the approach to multimodal language processing in MATCH (Multimodal Access To City Help), a mobile multimodal speech-pen interface to restaurant and subway i...

Unification-based Multimodal Parsing

In order to realize their full potential, multimodal systems need to support not just input from multiple modes, but also synchronized integration of modes. Johnston et al. (1997) model this integration using a unification operation over typed feature structures. This is an effective solution for a broad class of systems, but limits multimodal utterances to combinations of a single spoken phrase...

A Modular Approach to Turkish Noun Compounding: The Integration of a Finite-State Model

In this paper, we describe the design and integration of a three-level cascaded non-deterministic finite-state model of Turkish compounding into Turkish PAPPI, a comprehensive syntactic parser in the principles-and-parameters (P&P) framework. Our approach is to handle compounding as an intermediate stage between morphological analysis and syntactic parsing. We discuss how the compounding machine ...

A Toolkit for Creating and Testing Multimodal Interface Designs

Designing and implementing applications that can handle multiple recognition-based interaction technologies such as speech and gesture inputs is a difficult task. IMBuilder and MEngine are the two components of a new toolkit for rapidly creating and testing multimodal interface designs. First, an interaction model is specified in the form of a collection of finite state machines, using a simple...

Publication date: 2001